\[ \begin{align*} y_{it} &\in\{0,1\} \\ l\LP{E\LPV{y_{it}}{\vect z_{it}, o_{it}}} &= \vect\beta^\top\vect z_{it} + o_{it} \end{align*} \] \(l\) is a link function, \(\vect z_{it}\) are known covariates, and \(o_{it}\) are known offsets.
\[ \begin{align*} P\LPV{\vect Y_t}{\vect X_t, \mat Z_t} &= g_t\LPV{\vect Y_t}{\vect X_t} & P\LPV{\vect X_t}{\vect X_{t-1}} &= f\LPV{\vect X_t}{\vect X_{t-1}} \\ P\LP{\vect X_1} &=f\LP{\vect X_1} \end{align*} \]
\[ P\LP{\vect y_{1:d}} = \int P\LP{\vect x_1} g_1\LPV{\vect y_1}{\vect x_1} \prod_{t = 2}^d g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\vect x_{t - 1}} \diff \vect x_{1:d} \] All implicitly depend on the parameters in the model.
\[ \begin{align*} l\LP{E\LPV{y_{it}}{\vect z_t, o_{it}, \vect x_t, \vect u_{it}}} &= \vect\beta^\top\vect z_{it} + o_{it} + \vect x_t^\top\vect u_{it} \\ \vect x_t &\sim \mathcal N\LP{\mat F\vect x_{t-1}, \mat \Sigma} \end{align*} \] \(\vect u_{it}\) are known firm covariates.
\[ \begin{align*} Q\LPV{\vect\theta}{\vect\theta^{(i)}} &= E\LPV{\log P\LP{\vect y_{1:d}}}{\vect\theta^{(i)}} \\ \log P\LP{\vect y_{1:d}} &= \log f\LP{\vect X_1} + \sum_{t = 2}^d \log \varphi\LP{\vect X_t;\mat F\vect X_{t-1},\mat \Sigma} \\ &\quad + \sum_{t = 1}^d\sum_{i \in R_t} \LP{h\LP{\eta_{it}\LP{\vect X_t}}T\LP{y_{it}} - A\LP{\eta_{it}\LP{\vect X_t}} + B\LP{y_{it}}} \\ \eta_{it}\LP{\vect x_t} &= \vect\beta^\top\vect z_{it} + o_{it} + \vect x_t^\top\vect u_{it} \end{align*} \] where \(\varphi\) is a multivariate normal density function, \(R_t\) is the risk set at time \(t\), and \(h\), \(T\), \(A\), and \(B\) are known functions.
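For a binary outcome with the logit link these are the standard Bernoulli exponential family terms, stated here for concreteness: \[ \log P\LPV{y_{it}}{\eta_{it}} = \eta_{it}y_{it} - \log\LP{1 + e^{\eta_{it}}} \] so \(h\LP{\eta} = \eta\), \(T\LP{y} = y\), \(A\LP{\eta} = \log\LP{1 + e^{\eta}}\), and \(B\LP{y} = 0\).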
Need to evaluate
\[ E\LPV{\phi\LP{\vect X_{t-1:t}}}{\vect\theta^{(i)}}, \qquad E\LPV{\psi\LP{\vect X_t}}{\vect\theta^{(i)}} \]
Data set.
Models without frailty.
Models with frailty.
Want to approximate a target density \(\tilde d(x) = \alpha d(x)\), where only \(d\) is known; the normalising constant \(\alpha\) is unknown.
Use the effective sample size to judge the approximation
\[ESS = \LP{\sum_{i = 1}^N W_i^2}^{-1}\]
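A minimal sketch of the computation, working on log-weights for numerical stability (the stabilisation trick is standard practice, not from the slides):

```python
import numpy as np

def effective_sample_size(log_weights):
    """ESS = 1 / sum_i W_i^2 for self-normalised weights W_i,
    computed from log-weights for numerical stability."""
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())   # subtract max to avoid overflow
    W = w / w.sum()             # self-normalise
    return 1.0 / np.sum(W**2)

# Equal weights give ESS = N; a single dominant weight gives ESS near 1.
print(effective_sample_size(np.zeros(100)))
print(effective_sample_size([0.0] + [-50.0] * 99))
```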
This yields a discrete approximation of the distribution
\[\widehat P(X) = \sum_{i=1}^N W^{(i)} \delta_{x^{(i)}}(X)\] \(\delta\) is the Dirac delta function.
\[ E\LP{\phi(X)} = \int \phi(x)\tilde d(x)\diff x \approx \sum_{i = 1}^N W^{(i)} \phi\LP{x^{(i)}} \]
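As a concrete sketch, a self-normalised importance sampler for an illustrative target known only up to its normalising constant; the target (a half-normal), proposal, and sample size are arbitrary choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised target d(x): a standard normal restricted to x > 0
# (the half-normal, chosen because E[X] = sqrt(2 / pi) is known).
d = lambda x: np.exp(-0.5 * x**2) * (x > 0)

# Proposal q(x) = N(0, 2^2): q > 0 wherever d > 0, and d / q is bounded.
N = 100_000
x = rng.normal(0.0, 2.0, size=N)
q = np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

w = d(x) / q        # unnormalised importance weights
W = w / w.sum()     # self-normalised weights W^(i)

# E[phi(X)] ~ sum_i W^(i) phi(x^(i)) with phi(x) = x.
est = np.sum(W * x)
print(est, np.sqrt(2.0 / np.pi))
```

Note the proposal was picked so that its support covers the target's and the weight ratio stays bounded.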
Effective sample size is 29.59.
Effective sample size is 17.51.
\[ d(x) > 0 \Rightarrow q(x) > 0 \]
\[ \frac{d(x)}{q(x)} < C < \infty \]
Need
\[ P\LPV{\vect X_1}{\vect y_1} = \frac{g_1\LPV{\vect y_1}{\vect x_1}f\LP{\vect x_1}}{P\LP{\vect y_1}} \]
Use importance sampling to get
\[ \widehat P\LPV{\vect X_1}{\vect y_1} = \sum_{i = 1}^N \PF{W}{1}{i}\delta_{\PF{\vect x}{1}{i}}\LP{\vect X_1} \]
Also known as sequential Monte Carlo.
Use \(\widehat P\LPV{\vect X_{1:t-1}}{\vect y_{1:t-1}}\) in
\[ \begin{align*} P\LPV{\vect x_{1:t}}{\vect y_{1:t}} &= P\LPV{\vect x_{1:t-1}}{\vect y_{1:t-1}} \frac{ g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\vect x_{t-1}} }{P\LPV{\vect y_t}{\vect y_{1:t -1}}} \\ &\approx \sum_{i = 1}^N \PF{W}{t-1}{i} \delta_{\PF{\vect x}{1:t-1}{i}}\LP{\vect x_{1:t-1}} \frac{ g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\PF{\vect x}{t-1}{i}} }{P\LPV{\vect y_t}{\vect y_{1:t -1}}} \end{align*} \]
\[ \PF{\vect x}{t}{i} \sim q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t} \]
\[ \PF{\tilde W}{t}{i} = \PF{W}{t-1}{i} \frac{ g_t\LPV{\vect y_t}{\PF{\vect x}{t}{i}}f\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{i}} }{q_t\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{i},\vect y_t}}, \qquad \PF{W}{t}{i} \propto \PF{\tilde W}{t}{i} \]
Weights, \(\PF{W}{t}{i}\), can become concentrated on a few particles (weight degeneracy).
Use auxiliary particle filter (Pitt and Shephard 1999) and resample with weights that are ideally
\[ \PF{\beta}{t}{i}\propto P\LPV{\vect y_t}{\PF{\vect x}{t-1}{i}}\PF{W}{t-1}{i} \]
Setting \(\PF{\beta}{t}{i} = \PF{W}{t-1}{i}\) yields the sequential importance resampling algorithm.
Different resampling approaches can be used. See Douc and Cappe (2005).
Sample indices \(j_1,\dots,j_N\) using the weights \(\{\PF{\beta}{t}{i}\}_{i=1,\dots,N}\).
\[ \PF{\vect x}{t}{i} \sim q_t\LPV{\cdot}{\PF{\vect x}{t-1}{j_i},\vect y_t} \]
\[ \PF{\tilde W}{t}{i} = \frac{\PF{W}{t-1}{j_i}}{\PF{\beta}{t}{j_i}} \frac{ g_t\LPV{\vect y_t}{\PF{\vect x}{t}{i}}f \LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{j_i}} }{q_t\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{j_i},\vect y_t}}, \qquad \PF{W}{t}{i} \propto \PF{\tilde W}{t}{i} \]
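A minimal sketch of the sequential importance resampling special case (\(\PF{\beta}{t}{i} = \PF{W}{t-1}{i}\), proposal \(q_t = f\)), for a toy scalar AR(1) state with Bernoulli observations; the model and all settings are illustrative, not the slides' full frailty model:

```python
import numpy as np

rng = np.random.default_rng(1)

def sir_filter(y, N=1000, a=0.9, s=0.5):
    """Sequential importance resampling: beta_t^(i) = W_{t-1}^(i), q_t = f.

    Toy scalar model (illustrative only):
      x_1 ~ N(0, s^2 / (1 - a^2)),  x_t = a x_{t-1} + N(0, s^2),
      y_t | x_t ~ Bernoulli(sigmoid(x_t)).
    Returns the particles and normalised weights for each t.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    x = rng.normal(0.0, s / np.sqrt(1.0 - a**2), size=N)  # stationary init
    W = np.full(N, 1.0 / N)
    out = []
    for t, yt in enumerate(y):
        if t > 0:
            # Resample ancestors j_i with probabilities beta = W_{t-1},
            # then propagate with the state transition f.
            idx = rng.choice(N, size=N, p=W)
            x = a * x[idx] + rng.normal(0.0, s, size=N)
        # With q_t = f the transition densities cancel, so the new
        # unnormalised weight is just g_t(y_t | x_t^(i)).
        p = sigmoid(x)
        w = p**yt * (1.0 - p) ** (1 - yt)
        W = w / w.sum()
        out.append((x.copy(), W.copy()))
    return out

res = sir_filter(np.array([1, 1, 0, 1, 0]))
filter_means = [np.sum(W * x) for x, W in res]
print(filter_means)
```

Resampling at every step is the simplest choice; in practice one often resamples only when the effective sample size falls below a threshold.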
Need \(P\LPV{\vect X_t}{\vect y_{1:d}}\) and \(P\LPV{\vect X_{t-1:t}}{\vect y_{1:d}}\).
Use the generalised two-filter smoothing formula from Briers, Doucet, and Maskell (2009) and the smoother from Fearnhead, Wyncoll, and Tawn (2010).
Use the identity
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} P\LPV{\vect y_{t:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} P\LPV{\vect y_{t:d}}{\vect x_t} \end{align*} \]
Need to approximate \[ P\LPV{\vect y_{t:d}}{\vect x_t} = \int \prod_{k = t + 1}^d f\LPV{\vect x_k}{\vect x_{k-1}} \prod_{k =t}^d g_k\LPV{\vect y_k}{\vect x_k}\diff \vect x_{t + 1:d} \]
Let
\[ \gamma_1\LP{\vect x} = f\LP{\vect x}, \quad \gamma_t(\vect x) = \int f\LPV{\vect x}{\vect x_{t-1}} \gamma_{t-1}\LP{\vect x_{t-1}}\diff\vect x_{t-1} \] Then we introduce artificial probability densities
\[ \tilde P\LPV{\vect x_{t:d}}{\vect y_{t:d}} = \frac{ \gamma_t\LP{\vect x_t} \prod_{k = t + 1}^d f\LPV{\vect x_k}{\vect x_{k-1}} \prod_{k =t}^d g_k\LPV{\vect y_k}{\vect x_k} }{\tilde P\LP{\vect y_{t:d}}} \]
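With the linear Gaussian transitions of the frailty model, the \(\gamma_t\) recursion has a closed form: if \(\vect x_1 \sim \mathcal N\LP{\vect\mu_1, \mat P_1}\) then \[ \gamma_t\LP{\vect x} = \varphi\LP{\vect x; \vect\mu_t, \mat P_t}, \qquad \vect\mu_t = \mat F\vect\mu_{t-1}, \qquad \mat P_t = \mat F\mat P_{t-1}\mat F^\top + \mat\Sigma \]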
We can run a backward filter to sample \(\{\PB{\vect x}{t}{i}, \PB{W}{t}{i} \}_{i = 1, \dots, N}\) from \(\tilde P\LPV{\vect x_{t}}{\vect y_{t:d}}\).
We can show that
\[ \begin{align*} P\LPV{\vect y_{t:d}}{\vect x_t} &= \tilde P\LP{\vect y_{t:d}} \frac{\tilde P\LPV{\vect x_t}{\vect y_{t:d}}}{ \gamma_t\LP{\vect x_t}} \\ &\propto \frac{\tilde P\LPV{\vect x_t}{\vect y_{t:d}}}{ \gamma_t\LP{\vect x_t}} \approx \sum_{i = 1}^N \frac{\PB{W}{t}{i}}{\gamma_t\LP{\PB{\vect x}{t}{i}}} \delta_{\PB{\vect x}{t}{i}}\LP{\vect x_t} \end{align*} \]
Thus,
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &\approx \sum_{j=1}^N \PS{W}{t}{j}\delta_{\PB{\vect x}{t}{j}}\LP{\vect x_t} \\ \PS{W}{t}{j} &\propto \frac{\PB{W}{t}{j}}{\gamma_t\LP{\PB{\vect x}{t}{j}}} \sum_{i = 1}^N \PF{W}{t-1}{i}f\LPV{\PB{\vect x}{t}{j}}{\PF{\vect x}{t-1}{i}} \end{align*} \]
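The combination of the forward and backward particle sets can be sketched directly; this is the \(\bigO{N^2}\) step. The AR(1) transition, the choice of \(\gamma_t\), and all settings below are illustrative:

```python
import numpy as np

def twofilter_weights(xf, Wf, xb, Wb, f_dens, gamma_t):
    """Smoothing weights from a forward particle set at t-1 (xf, Wf)
    and a backward-filter particle set at t (xb, Wb):

      W_smooth[j] propto (Wb[j] / gamma_t(xb[j])) * sum_i Wf[i] f(xb[j] | xf[i])

    The sum over i for every j makes this an O(N^2) step.
    """
    Ws = np.array([
        Wb[j] / gamma_t(xb[j]) * np.sum(Wf * f_dens(xb[j], xf))
        for j in range(len(xb))
    ])
    return Ws / Ws.sum()

# Toy usage: scalar AR(1) transition f(x_t | x_{t-1}) = N(a x_{t-1}, s^2).
# With a stationary initial distribution, gamma_t is the stationary
# N(0, s^2 / (1 - a^2)) density for every t.
a, s = 0.9, 0.5
norm = lambda x, m, sd: np.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
f_dens = lambda xt, xprev: norm(xt, a * xprev, s)
gamma_t = lambda x: norm(x, 0.0, s / np.sqrt(1.0 - a**2))

rng = np.random.default_rng(2)
xf, xb = rng.normal(size=50), rng.normal(size=50)
Wf = Wb = np.full(50, 1.0 / 50)
Ws = twofilter_weights(xf, Wf, xb, Wb, f_dens, gamma_t)
print(Ws.sum())
```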
Use the identity
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} P\LPV{\vect y_{t:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} g_t\LPV{\vect y_t}{\vect x_t} P\LPV{\vect y_{t+1:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} g_t\LPV{\vect y_t}{\vect x_t} \cdot \\ &\qquad\int P\LPV{\vect y_{t+1}}{\vect x_{t+1}} f\LPV{\vect x_{t + 1}}{\vect x_t}\diff \vect x_{t+1} \end{align*} \]
The smoothing step is \(\bigO{N}\) in the number of particles but \(\bigO{n_tp}\) in the observations, where \(n_t = \lvert R_t \rvert\) and \(p\) is the dimension of \(\vect X_t\).
Can generalize to handle singular components.
Doucet and Johansen (2011) and Givens and Hoeting (2012, chap. 6) have been used in the preparation of these slides.
Briers, Mark, Arnaud Doucet, and Simon Maskell. 2009. “Smoothing Algorithms for State–space Models.” Annals of the Institute of Statistical Mathematics 62 (1): 61. doi:10.1007/s10463-009-0236-2.
Douc, R., and O. Cappe. 2005. “Comparison of Resampling Schemes for Particle Filtering.” In ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005., 64–69. doi:10.1109/ISPA.2005.195385.
Doucet, Arnaud, and Adam M Johansen. 2011. “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later.” In Handbook of Nonlinear Filtering, edited by Dan Crisan and Boris Rozovskii, 741–67. Oxford: Oxford University Press.
Fearnhead, Paul, David Wyncoll, and Jonathan Tawn. 2010. “A Sequential Smoothing Algorithm with Linear Computational Cost.” Biometrika 97 (2). [Oxford University Press, Biometrika Trust]: 447–64. http://www.jstor.org/stable/25734097.
Givens, Geof H, and Jennifer A Hoeting. 2012. Computational Statistics. 2nd ed. Wiley.
Klaas, Mike, Mark Briers, Nando de Freitas, Arnaud Doucet, Simon Maskell, and Dustin Lang. 2006. “Fast Particle Smoothing: If I Had a Million Particles.” In Proceedings of the 23rd International Conference on Machine Learning, 481–88. ICML ’06. New York, NY, USA: ACM. doi:10.1145/1143844.1143905.
Pitt, Michael K., and Neil Shephard. 1999. “Filtering via Simulation: Auxiliary Particle Filters.” Journal of the American Statistical Association 94 (446). [American Statistical Association, Taylor & Francis, Ltd.]: 590–99. http://www.jstor.org/stable/2670179.
Comments
Smoothing step is \(\bigO{N^2}\) but independent of \(n_t = \lvert R_t \rvert\).
Can be reduced with approximate methods in Klaas et al. (2006). Yields \(\bigO{N\log N}\) run times. Alternatively, use rejection sampling if \(f\LPV{\vect x_t}{\vect x_{t-1}} / \gamma_t\LP{\vect x_t}\) is bounded.
Cannot handle singular components.